Here we go reading up all my tables and merging ’em together. I haven’t incorporated read() or the join function yet, but its faster cause smaller files:
PB1 <- merge_all(list(read.csv(file='H2N2PB1.tidy.csv', head=TRUE, sep='\t'),
read.csv(file='H1N1PB1.tidy.csv', head=TRUE, sep='\t'),
read.csv(file='H3N2PB1.tidy.csv', head=TRUE, sep='\t')),
all.x=TRUE, all.y=TRUE)
codonvalues <- read.csv(file='codonvalues.csv', head=TRUE, sep='\t')
Now I gotta table with: strain, sero, year, coryear, codon, numcodon, numaa. I also have another table holding a bunch of values associated with each codon.
Now I can plot!
Number of each codon found in strain
ggplot(PB1, aes(x=coryear, y=(numcodon), color=sero)) +
geom_point() +
geom_smooth(method='lm', se=FALSE) +
background_grid(major = "xy", minor = "none") +
facet_wrap(~codon)
Proportion of each codon used for its amino acid in strain
ggplot(PB1, aes(x=coryear, y=(numcodon/numaa), color=sero)) +
geom_point() +
geom_smooth(method='lm', se=FALSE) +
background_grid(major = "xy", minor = "none") +
facet_wrap(~codon)
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
Pulling out linear regression
PB1 %>% group_by(sero, codon) %>% do(lm.result = lm((numcodon) ~ coryear, data=.)) -> fitted.models
intslope <- do(fitted.models, data.frame(sero=.$sero,codon=.$cod,
intercept=coef(.$lm.result)[1],
slope=coef(.$lm.result)[2]))
rm(fitted.models)
intslope
## Source: local data frame [192 x 4]
## Groups: <by row>
##
## sero codon intercept slope
## 1 H2N2 AAA 21.514716 0.217697346
## 2 H2N2 AAC 17.894182 -0.056736985
## 3 H2N2 AAG 30.546019 -0.262504253
## 4 H2N2 AAT 30.045509 0.259931099
## 5 H2N2 ACA 38.227714 -0.249489622
## 6 H2N2 ACC 10.034791 0.293573494
## 7 H2N2 ACG 2.964274 0.007974651
## 8 H2N2 ACT 8.976438 0.025348758
## 9 H2N2 AGA 28.184161 0.019160429
## 10 H2N2 AGC 13.557843 0.103160088
## .. ... ... ... ...
Nice! slopes and intercetps GOT! Next up: p values
Note by Claus: You should be able to obtain p values directly from the linear regression using summary(.$lm.result)$coef[4,2] in the above code.
PB1 %>% group_by(sero, codon) %>%
do(cor.res = cor.test(.$numcodon, .$coryear)) -> cors
## Warning in cor(x, y): the standard deviation is zero
## Warning in cor(x, y): the standard deviation is zero
## Warning in cor(x, y): the standard deviation is zero
## Warning in cor(x, y): the standard deviation is zero
corp <- do(cors, data.frame(sero=.$sero, codon=.$codon, cor=.$cor.res$estimate, p=.$cor.res$p.value))
rm(cors)
corp
## Source: local data frame [192 x 4]
## Groups: <by row>
##
## sero codon cor p
## 1 H2N2 AAA 0.7683672 0.000000e+00
## 2 H2N2 AAC -0.3202454 1.161447e-03
## 3 H2N2 AAG -0.7723004 5.053832e-21
## 4 H2N2 AAT 0.7522338 0.000000e+00
## 5 H2N2 ACA -0.8025226 1.039683e-23
## 6 H2N2 ACC 0.8240146 0.000000e+00
## 7 H2N2 ACG 0.1093708 2.787148e-01
## 8 H2N2 ACT 0.2918470 3.217027e-03
## 9 H2N2 AGA 0.1178618 2.428569e-01
## 10 H2N2 AGC 0.5003398 1.153826e-07
## .. ... ... ... ...
I mostly just copy-pasted the stats stuff. I think I understand them, but I’ll have to mess around with them a bit more.
Anyways, now ima dump that into a table with CAI and rtAI. I’m also including a table that shows the change in codon usage between my PB1 construct and the WT Udorn PB1.
codonstats <- select(corp, sero, codon, p) %>%
left_join(select(intslope, codon, sero, slope)) %>%
left_join(select(codonvalues, codon, withIFN, noIFN, CAI)) %>%
mutate(rtAI = withIFN/noIFN) %>%
left_join(read.csv(file='construct_v_WT_change.csv', head=TRUE, sep='\t'))
## Joining by: c("sero", "codon")
## Joining by: "codon"
## Joining by: "codon"
## Warning in left_join_impl(x, y, by$x, by$y): joining factors with different
## levels, coercing to character vector
Here are plots comparing slope, rtAI, and CAI. I’m expecting positive slopes for rtAI vs slope, negative for the rest I guess.
ggplot(codonstats, aes(x=rtAI, y=slope, label=codon)) +
geom_point() +
background_grid(major = "xy", minor = "none") +
geom_text(size = 5) +
facet_wrap(~sero)
ggplot(codonstats, aes(x=CAI, y=slope, label=codon)) +
geom_point() +
background_grid(major = "xy", minor = "none") +
geom_text(size = 5) +
facet_wrap(~sero)
ggplot(codonstats, aes(x=rtAI, y=CAI, label=codon)) +
geom_point() +
background_grid(major = "xy", minor = "none") +
geom_text(size = 5) +
facet_wrap(~sero)
Here are plots comparing the change between my older CAI construct and the WT with the rtAI, CAI and slope.
codonstats %>% filter(sero == "H3N2") %>%
ggplot(aes(x=change_con, y=rtAI, label=codon)) +
geom_point() +
background_grid(major = "xy", minor = "none") +
geom_text(size = 5)
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 37 rows containing missing values (geom_text).
codonstats %>% filter(sero == "H3N2") %>%
ggplot(aes(x=change_con, y=CAI, label=codon)) +
geom_point() +
background_grid(major = "xy", minor = "none") +
geom_text(size = 5)
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 37 rows containing missing values (geom_text).
codonstats %>% filter(sero == "H3N2") %>%
ggplot(aes(x=change_con, y=slope, label=codon)) +
geom_point() +
background_grid(major = "xy", minor = "none") +
geom_text(size = 5)
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 37 rows containing missing values (geom_text).
Here are plots comparing the change between my construct exactly replicating newer H3N2 codon usage and the WT with the rtAI, CAI and slope.
codonstats %>% filter(sero == "H3N2") %>%
ggplot(aes(x=change_new, y=rtAI, label=codon)) +
geom_point() +
background_grid(major = "xy", minor = "none") +
geom_text(size = 5)
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 37 rows containing missing values (geom_text).
codonstats %>% filter(sero == "H3N2") %>%
ggplot(aes(x=change_new, y=CAI, label=codon)) +
geom_point() +
background_grid(major = "xy", minor = "none") +
geom_text(size = 5)
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 37 rows containing missing values (geom_text).
codonstats %>% filter(sero == "H3N2") %>%
ggplot(aes(x=change_new, y=slope, label=codon)) +
geom_point() +
background_grid(major = "xy", minor = "none") +
geom_text(size = 5)
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 37 rows containing missing values (geom_text).
Comparing the change in codon usage from WT for both constructs.
codonstats %>% filter(sero == "H3N2") %>%
ggplot(aes(x=change_new, y=change_con, label=codon)) +
geom_point() +
background_grid(major = "xy", minor = "none") +
geom_text(size = 5)
## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 37 rows containing missing values (geom_text).
I’m really not sure how to interpret this, though…